Bulletpapers - Understand complex papers in seconds

April 2024

Enhancing satellite image resolution with Swin transformers

This paper proposes Swin2-MoSE, an enhanced Swin Transformer model for increasing satellite image resolution. It uses a Mixture-of-Experts layer to improve performance, combines positional encodings for better localization, and applies a combined loss to optimize image quality. Experiments show it outperforms state-of-the-art models by up to 0.958 dB PSNR, demonstrating efficacy on semantic ...
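For readers unfamiliar with the metric, PSNR (peak signal-to-noise ratio) is defined as 10 * log10(MAX^2 / MSE). The sketch below is a generic illustration of that definition, not code from the paper; the function names, the nested-list image format, and the 8-bit peak value of 255 are assumptions made for the example.

```python
import math


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized grayscale images.

    Images are nested lists of pixel values (an assumption for this sketch).
    """
    ref_flat = [p for row in reference for p in row]
    est_flat = [p for row in estimate for p in row]
    if len(ref_flat) != len(est_flat) or not ref_flat:
        raise ValueError("images must be non-empty and have the same size")
    # Mean squared error over all pixels.
    mse = sum((r - e) ** 2 for r, e in zip(ref_flat, est_flat)) / len(ref_flat)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def mse_ratio_for_gain(gain_db):
    """Factor by which MSE shrinks for a given PSNR gain in dB."""
    return 10.0 ** (gain_db / 10.0)
```

Under this definition, a 0.958 dB gain corresponds to roughly 20% lower mean squared error at the same peak value.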

April 2024

Key Datasets for Perception in Unstructured Outdoor Environments

This paper surveys publicly available datasets for perception tasks like semantic segmentation and object detection in unstructured outdoor environments. It focuses on categorizing datasets by task and key characteristics like image count, classes, and more. The goal is helping practitioners determine the right dataset for their application.

April 2024

Principal Mask Proposals for Unsupervised Semantic Segmentation

This paper presents a method called PriMaPs to decompose images into semantically meaningful masks using principal components of self-supervised image features. These masks serve as proposals to guide an expectation-maximization algorithm, PriMaPs-EM, to realize unsupervised semantic segmentation by fitting class prototypes. Despite its simplicity, PriMaPs-EM leads to compe...

April 2024

Decoupling SAM for robotic surgery tool segmentation

This paper proposes Surgical-DeSAM, an approach that combines object detection and segmentation models to enable real-time robotic surgery tool segmentation without manual prompting. It utilizes Swin Transformer for detection and a decoupled Segment Anything Model for segmentation, outperforming prior methods on surgery datasets.

April 2024

Weakly Supervised Incremental Segmentation by Resolving Label Conflicts

This paper proposes a tendency-driven relationship of mutual exclusivity to resolve conflicting pixel-level predictions between new and old classes in weakly supervised incremental segmentation. It allows generating high-quality pseudo-masks for new classes while mitigating catastrophic forgetting of old classes.

April 2024

Semantic features for indoor scene classification

This paper proposes novel methods to represent indoor scenes using semantic information from object detection and semantic segmentation. These techniques provide spatial, shape, and layout cues to address limitations in existing deep learning approaches. A multi-branch network is developed that combines global image features, object detections, and pixel-level segmentat...

April 2024

Bird's Eye View Segmentation for Fisheye Cameras

This paper introduces a method to perform semantic segmentation in bird's eye view using images from surround-view fisheye cameras on vehicles. It creates a synthetic dataset to address the lack of public fisheye data, and handles occlusion reasoning which is missing in other synthetic sets. The method generalizes to any camera model, and outperforms baselines that use ...

April 2024

Aligning model objectives for zero-shot segmentation

This paper proposes AlignZeg, an architecture that improves zero-shot semantic segmentation by aligning the model's objectives with the goal of accurately segmenting unseen classes. It extracts detailed class-agnostic masks, enhances feature space generalizability, and corrects prediction bias towards seen classes.

April 2024

Reducing background noise in attention maps for weakly supervised segmentation

This paper proposes a method to reduce background noise in attention maps used for weakly supervised semantic segmentation. The method enhances Class Activation Maps (CAMs) with attention maps from a Conformer model, and adds noise during training to further suppress background noise. Experiments show improved segmentation accuracy over prior methods on the PASCAL VOC and COCO datasets.

March 2024

Hybrid learning for event camera semantic segmentation

This paper proposes a hybrid pseudo-labeling framework called HPL-ESS to improve unsupervised semantic segmentation for event cameras. Event cameras offer advantages under high-speed motion and challenging lighting, but labeling event data is difficult, so past approaches use event-to-image reconstruction for pseudo-labels. However, reconstruction introduces noise that gets...

March 2024

Efficient multi-target domain adaptation for semantic segmentation

This paper proposes an efficient framework called OurDB for multi-target domain adaptation in semantic segmentation, using only a single teacher model. It cycles through target domains, aligning them individually to avoid biased alignment. It also prevents forgetting previous domains and leverages context to handle varying target contexts.

March 2024

Grouping Points for Semantic-Aware 3D Representation Learning

This paper proposes GroupContrast, a self-supervised framework to learn effective 3D representations by combining segment grouping and semantic-aware contrastive learning. Segment grouping partitions unlabeled point clouds into semantically coherent regions to provide guidance for subsequent contrastive learning. By constructing positive pairs within groups and negative...

March 2024

Monocular 3D Semantic Scene Completion

This paper proposes MonoOcc, an approach to infer complete 3D semantic geometry from a single image. It refines an existing monocular framework with auxiliary supervision and cross-attention with image features. It also employs distillation to transfer knowledge from a privileged multi-view branch for efficiency. MonoOcc achieves state-of-the-art camera-based scene comp...

March 2024

Context-aware learning for semantic segmentation

This paper proposes a context prototype-aware learning strategy to improve weakly supervised semantic segmentation models. It argues that there is a knowledge bias between individual instances and global contexts, which limits a model's ability to fully capture details of each instance. The method enhances prototype learning to better represent diverse spatial and appea...

March 2024

Point cloud semantic features for 3D object detection

This paper proposes a new representation for LiDAR-only 3D object detectors by integrating segmented point clouds from a 3D semantic segmentation model. This provides supplementary semantic information while preserving the detectors' geometric data. Experiments show performance gains over prior state-of-the-art multi-modal detectors on the KITTI benchmark, especially fo...

March 2024

Exploring lower-density regions to improve semi-supervised semantic segmentation

This paper proposes a new method called Density-Descending Feature Perturbation (DDFP) to improve semi-supervised semantic segmentation. It is based on the idea that shifting features towards lower-density regions can help the model explore and find better decision boundaries. A lightweight density estimator using normalizing flows is introduced to capture feature densi...

March 2024

Extensive data augmentation for domain adaptation

This paper proposes a data augmentation strategy called ECAP to improve unsupervised domain adaptation for semantic segmentation. ECAP maintains a memory bank of target domain pseudo-labels over training. It selects the most confident pseudo-labels and cut-and-pastes them onto source domain images to augment the training data. This leverages reliable pseudo-labels and r...
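The cut-and-paste step described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not ECAP's implementation: the function names, the list-of-dicts memory bank with a `confidence` field, and the nested-list images are all hypothetical.

```python
def most_confident(memory_bank, k):
    """Return the k memory-bank entries with the highest confidence score."""
    return sorted(memory_bank, key=lambda e: e["confidence"], reverse=True)[:k]


def paste_pseudo_labeled(source_img, source_lbl, target_img, target_pseudo, paste_classes):
    """Paste target pixels whose pseudo-label is in paste_classes onto a source sample.

    All images and label maps are same-sized nested lists (an assumption for this sketch).
    Returns new copies; the inputs are left unmodified.
    """
    out_img = [row[:] for row in source_img]
    out_lbl = [row[:] for row in source_lbl]
    for y, row in enumerate(target_pseudo):
        for x, cls in enumerate(row):
            if cls in paste_classes:
                out_img[y][x] = target_img[y][x]  # copy the target pixel
                out_lbl[y][x] = cls               # and its pseudo-label
    return out_img, out_lbl
```

In this sketch the augmented sample keeps the source label everywhere except where target content was pasted, so the pasted pseudo-labels are the only target supervision that enters training.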

March 2024

Generalizable semantic neural radiance fields

This paper introduces a Generalizable Semantic Neural Radiance Field (GSNeRF) model that can synthesize novel view images and associated semantic maps for unseen scenes. The model has two key stages: Semantic Geo-Reasoning to extract geometry and semantic features from input views, and Depth-Guided Visual Rendering to efficiently sample points using predicted depth to ...

March 2024

Semantic LiDAR Odometry for Fast Moving Vehicles

This paper proposes a framework for real-time LiDAR odometry and mapping for fast moving autonomous vehicles. It utilizes deep learning for semantic segmentation to improve the accuracy of matching between LiDAR scans. A novel algorithm also detects and rejects outlier matches between different objects of the same semantic class. Experiments on the KITTI dataset demonst...

March 2024

Dual-domain image fusion for remote sensing semantic segmentation

This paper proposes a hybrid training strategy and dual-domain image fusion approach to improve semantic segmentation of remote sensing images using unlabeled target domain data. It fuses original and transformed images to create intermediate representations, enhances pseudo-labels through region-based weighting, and demonstrates significant performance gains on public ...

February 2024

Efficient attention model for scene parsing

This paper presents a network for semantic scene segmentation that uses attention techniques to select the most useful features across channels and spatial locations. It gathers features from multiple network depths to capture multi-scale context, and applies channel and spatial attention modules to identify the most relevant features. An auxiliary task helps learn glob...

February 2024

Decoupling co-occurring objects in weakly supervised segmentation

This paper proposes a method called SeCo to tackle the common issue in weakly supervised semantic segmentation where co-occurring objects in images lead to false activations. It separates co-occurring objects by dividing images into patches and assigning category tags. Then it enhances semantic representation to promote discrepancy between co-categories. Experiments val...

February 2024

Realistically inserting objects to test anomaly detection

The paper proposes a method to realistically add arbitrary objects into images to create datasets for evaluating anomaly detection in semantic segmentation models. It allows dynamically extending datasets to assess risk when deploying models.

February 2024

Video segmentation adaptation

This paper evaluates video domain adaptation techniques for semantic segmentation, which aim to adapt models trained on labeled simulated videos to unlabeled real-world dashcam videos. It finds that standard image domain adaptation methods substantially outperform specialized video techniques, achieving state-of-the-art results. A code library is provided to facilitate ...

January 2024

Efficient Tuning of Segment Anything Model

This paper introduces Conv-LoRA, an effective yet parameter-efficient fine-tuning method to adapt the Segment Anything Model (SAM) for specialized segmentation tasks. Conv-LoRA integrates lightweight convolutions into SAM's encoder using Low-Rank Adaptation, reinforcing built-in inductive biases and reviving SAM's ability to learn high-level semantics constrained by pre...

January 2024

Efficient open-vocabulary object detection with YOLO-World

The authors propose YOLO-World, which enhances YOLO detectors with open-vocabulary detection capabilities. They introduce a Re-parameterizable Vision-Language Path Aggregation Network and region-text contrastive loss to facilitate visual-linguistic interaction. In experiments, YOLO-World achieves high accuracy and speed for zero-shot detection on LVIS, outperforming sta...

January 2024

Simple open-vocabulary image segmentation

This paper introduces S-Seg, a novel model for open-vocabulary semantic image segmentation. S-Seg trains a MaskFormer model using pseudo-masks for supervision and an image-text contrastive loss for language alignment, without relying on pretrained vision-language models or manual annotations. Once trained, it generalizes well to unseen classes and datasets. Benefits inc...

January 2024

One model for all segmentation tasks

The authors propose OMG-Seg, a single transformer-based model that can perform over 10 different image and video segmentation tasks effectively, including semantic, instance, panoptic, open-vocabulary, interactive, and video object segmentation. It uses a shared encoder-decoder architecture to reduce overhead.

January 2024

Document layout analysis dataset for ancient manuscripts

The paper introduces a new dataset for document layout analysis of ancient manuscripts, developed through collaboration between computer scientists and humanities experts. It provides pixel-precise, non-overlapping ground truth segmentations for 6 semantic classes, and aims to meet the needs of both fields more effectively than prior datasets. A computer-aided segmentat...

December 2023

Efficient semantic segmentation with a single CNN

This paper proposes SCTNet, a novel approach to semantic segmentation that utilizes a training-only transformer branch to provide rich semantic information to a lightweight single-branch CNN. This allows the CNN to generate accurate segmentation masks at high inference speeds, without needing a costly additional branch. Extensive experiments show state-of-the-art perfor...

December 2023

Improving vision-language alignment via automatic tag parsing

This paper proposes an approach to improve alignment between image and text features in vision-language models, without needing additional supervision. It automatically parses objects and attributes from image captions, using them as supervision to train the model via a multi-tag classification loss alongside the image-text contrastive loss. This boosts performance on s...

December 2023

Stronger segmentation with descriptive properties

This paper proposes using descriptive properties from large language models, instead of one-hot category labels, to supervise segmentation models. Properties are clustered into an interpretable label space. This enhances model performance, scalability and generalization ability.

December 2023

Weak Supervision for Semantic Segmentation in Driving Scenes

This paper develops a new weakly-supervised semantic segmentation framework tailored for driving scene datasets. It analyzes dataset characteristics and uses CLIP to generate pseudo-masks. To handle small objects missed by CLIP, it incorporates small-scale patches during training. To reduce noise, it discerns reliable vs noisy regions based on prediction consistency, su...

December 2023

Superpoint grouping for indoor anchor-free 3D object detection

The paper proposes a novel superpoint grouping network for indoor anchor-free one-stage 3D object detection. It partitions raw point clouds into superpoints with semantic consistency and spatial similarity. A geometry-aware voting module constrains relationships between superpoints and object centers to adapt to anchor-free detection. A superpoint-based grouping module ...

December 2023

Self-supervised multi-camera 3D scene reconstruction

This paper proposes OccNeRF, a method to predict 3D occupancy and geometry from multi-camera images in a self-supervised fashion, without ground truth 3D or 2D labels. It handles unbounded scenes by parameterizing occupancy fields. Multi-frame photometric consistency provides supervision, aided by an open-vocabulary segmentation model for semantics.

December 2023

Self-guided semantic image segmentation

The authors propose a novel framework called Self-Seg that can automatically detect and segment objects in images without needing any textual input specifying class names. Self-Seg uses a vision-language model called BLIP to cluster image regions and generate captions describing each cluster. These captions are filtered to extract noun class names, which guide a semanti...

November 2023

Lightweight clustering for semantic segmentation

This paper proposes a lightweight clustering framework to perform semantic segmentation without labels. It utilizes attention features from self-supervised vision transformers, which have strong foreground/background differences. These features are clustered into groups at the dataset, category, and image levels. Consistency across levels extracts high-quality binary ps...

November 2023

Test-time adaptation via generative feedback

This paper proposes Diffusion-TTA, a method to adapt image classifiers, segmenters and depth predictors to individual test images using feedback from a generative diffusion model. The key idea is to modulate the conditioning of the diffusion model using the output of the discriminative model, and maximize image likelihood by updating parameters of both models. Experimen...

November 2023

Fusing camera and lidar data over time to detect occluded vehicles

This paper proposes a new approach called TLCFuse that leverages sequences of camera images, lidar data, and motion information over time to create accurate overhead semantic maps of driving scenes, even when objects are occluded. The method uses attention mechanisms to encode spatial relationships and temporal context into a compact representation. Experiments on a lar...

November 2023

Object detection and segmentation from partially labeled images

This paper studies how to train object detection and semantic segmentation together when each training image only has labels for one task. It finds that alternating optimization between tasks helps both tasks, even without shared labels per image. Knowledge distillation further improves results by transferring information between task networks. Overall, multi-task lea...

November 2023

Leveraging point annotations for segmentation

This paper investigates using intensity-based distance maps with boundary loss for point-supervised semantic segmentation. Boundary loss penalizes false positives more heavily the farther they lie from the object, so it seems unsuitable for weak supervision where some false positives are needed. But intensity-aware distances like geodesic and minimum barrier distances may allow some false posi...

November 2023

Visual prompts boost transformers for scene understanding

This paper proposes a novel method to improve multi-task dense scene understanding using visual prompts. The model uses task-specific learnable prompt tokens that interact with image tokens in a transformer encoder. This allows the model to learn specialized representations tailored to each task. The method achieved state-of-the-art results on semantic segmentation, dep...

November 2023

Efficient image communication for AIoT using deep semantic segmentation and restoration

This paper proposes a novel deep image semantic communication model to enable efficient image transmission for Artificial Intelligent Internet of Things (AIoT) devices. At the transmitter, a high-precision semantic segmentation algorithm extracts key image semantics, significantly compressing data. At the receiver, a Generative Adversarial Network (GAN) restores the sem...

November 2023

Using vision models for 3D point cloud segmentation

This paper explores adapting large pretrained 2D vision models like CLIP and SAM to perform semantic segmentation on 3D point clouds. The authors make 2D predictions then project into 3D, combining results via a label fusion strategy. Experiments on ScanNet assess zero-shot learning and using sparse 2D supervision. Overall, it shows promise in transferring 2D vision mod...
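One simple way to combine per-view 2D predictions after projection into 3D is a per-point majority vote. The summary does not say which fusion strategy the authors use, so the sketch below is an assumed stand-in, with hypothetical names and a hypothetical `(point_id, label)` input format.

```python
from collections import Counter, defaultdict


def fuse_labels(observations):
    """Fuse per-view labels into one label per 3D point by majority vote.

    observations: iterable of (point_id, label) pairs, one pair for each
    2D view in which the point is visible (assumed format for this sketch).
    """
    votes = defaultdict(Counter)
    for point_id, label in observations:
        votes[point_id][label] += 1
    # Pick the most frequent label for each point.
    return {pid: counts.most_common(1)[0][0] for pid, counts in votes.items()}
```

Points seen in more views get more votes, so a single noisy 2D prediction is less likely to decide the final 3D label.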

November 2023

Using deep learning to separate overlapping handwritten letters

This paper explores using deep learning and semantic segmentation to identify and isolate individual overlapping handwritten letters from historical manuscripts. The authors tested convolutional neural network models on simulated datasets of overlapping letters. They achieved promising results, showing these AI methods can help make faded, erased, or overwritten texts m...

November 2023

Automated overhead line inspection using semantic segmentation

This paper proposes a new framework for automated inspection of overhead power lines using images. It uses a Faster R-CNN model to identify equipment in images, then an unsupervised semantic segmentation algorithm separates the equipment from the background. Defects are detected by comparing the segmented equipment image to a normal reference image. This focuses on reco...

October 2023

Test-time adaptation for semantic segmentation

This paper proposes methods for test-time adaptation of semantic segmentation models on synthetic driving videos with changing weather and lighting conditions. The goal is to adapt a pre-trained model at test time as the distribution shifts, without forgetting the original domain. The methods are evaluated on sequences that gradually drift away then back to the source d...

October 2023

Land cover mapping with aerial and satellite imagery

This paper introduces FLAIR, a large multi-sensor dataset for land cover mapping. It combines very high resolution aerial images with lower resolution satellite image time series. This allows models to leverage both detailed spatial information and temporal dynamics of land cover. The dataset spans diverse regions of France and provides over 20 billion annotated pixels....

October 2023

Weakly-supervised semantic segmentation with image labels

This paper reviews traditional and foundation model methods for weakly-supervised semantic segmentation, which uses only image labels rather than pixel masks. Traditional methods refine initial coarse masks to expand object regions. Foundation models like SAM show high potential, producing masks surpassing human annotations.

October 2023

Intervention-driven pixel relation modeling

This paper proposes a new approach called IDRNet that models relationships between pixels in an image to improve semantic segmentation. It simplifies pixel-level relation modeling to object-level relation modeling by grouping pixels into semantic representations. The key idea is using 'deletion diagnostics' to build a relation matrix between these representations, inste...

October 2023

Efficient vision transformers for semantic segmentation

This paper proposes a method to transfer knowledge from large convolutional neural networks to compact vision transformer models for semantic segmentation. It introduces techniques to align heterogeneous representations and reduce the impact of teacher errors.

October 2023

Classic test-time adaptation methods fail for semantic segmentation

This paper systematically investigates classic test-time adaptation (TTA) methods for semantic segmentation. Through extensive experiments, it finds that techniques effective for classification TTA, like batch norm updating and teacher-student schemes, do not work well for segmentation. Key challenges are inaccurate distribution estimation and long-tailed class imbalanc...

October 2023

Semantic segmentation for road images

This paper proposes a new deep learning model for semantic segmentation of road images, which is useful for tasks like crack detection in intelligent transportation systems. It combines a convolutional neural network architecture with adversarial learning. The model integrates a generative adversarial network framework into a traditional semantic segmentation model. Thi...

October 2023

Real-time open-vocabulary 3D mapping

This paper presents Open-Fusion, a new approach for real-time open-vocabulary 3D mapping and queryable scene representation from RGB-D data. It uses a vision-language model called SEEM to extract region-based features from images. These features are integrated with 3D geometry from Truncated Signed Distance Function (TSDF) reconstruction using an efficient matching algo...

October 2023

Radar Semantic Segmentation for Autonomous Vehicles

This paper proposes a new method called TransRadar for semantic segmentation of radar data to understand driving scenes. It uses a novel attention architecture and loss functions tailored for radar data's noise and class imbalance. TransRadar achieved state-of-the-art results on benchmark datasets while using a compact model.

October 2023

Cross layer refinement for lane detection

This paper proposes a deep learning model called CLRNet that uses both high-level semantic features and low-level fine details to precisely detect lane markings in diverse conditions. The method refines coarse lane localization iteratively using contextual information. This achieves state-of-the-art performance on major benchmarks.

October 2023

Learning consistent visual features from posed images

This paper proposes a method to learn visual features that are consistent across different viewpoints of the same real-world scene. The key idea is to frame the problem as image patch retrieval: retrieving all image patches that map to the same 3D location. A ranking-based loss encourages features to be viewpoint invariant within a spatial tolerance, producing multi-sc...

September 2023

Informative mining for one-shot semantic segmentation

This paper proposes a new method called Informative Data Mining (IDM) to enable efficient one-shot domain adaptation for semantic segmentation. IDM provides an uncertainty-based criterion to select the most informative samples for training, reducing redundant optimization. It also performs model adaptation using patch-wise mixing and prototype-based maximization to alle...

September 2023

Learning segmentation from image-text pairs

This paper proposes a method to learn semantic segmentation models from only image and caption pairs, without needing costly pixel-level mask annotations. It identifies an issue where captions often miss visual concepts in images, hindering learning. The authors design a pipeline to expand captions with relevant concepts using an image retrieval process and vision-langu...

September 2023

Guiding unsupervised segmentation with depth and sampling

This paper proposes a method to improve unsupervised semantic segmentation by incorporating depth information. It guides the model's feature learning using depth-feature correlation, aligning distances in depth and feature space. It also uses depth for informed feature sampling. Evaluated on COCO, Cityscapes and Potsdam datasets, the method achieves state-of-the-art per...

September 2023

Predicting indoor surroundings from sound

This paper proposes a method to predict depth, semantic segmentation, and 3D structure of indoor environments using only binaural audio input. It introduces a cross-modal distillation framework called Spatial Alignment via Matching (SAM) to align audio and visual features and transfer knowledge from visual to audio models.

September 2023

Synthetic Amodal Perception for Autonomous Driving

This paper introduces AmodalSynthDrive, a large-scale synthetic dataset for amodal scene understanding tasks relevant to autonomous driving. The dataset contains diverse driving scenarios with multiple camera views, LiDAR, odometry, and meticulous pixel-level annotations. It supports benchmarking for fundamental amodal perception tasks like amodal semantic segmentation ...

August 2023

Self-supervised semantic segmentation from medical images

This paper proposes a novel self-supervised learning approach called S3-Net for performing semantic segmentation of medical images without requiring manual annotations. The method uses specialized network modules and loss functions to learn feature representations directly from the images themselves. Key aspects include capturing both local and global context, handling ...

August 2023

Learning to control video compression for deep vision models

This paper proposes an end-to-end learnable approach to control standard video codecs like H.264 for optimizing downstream performance of deep vision models under dynamic bandwidth constraints. A lightweight deep neural network is trained to predict high-dimensional codec parameters from video content and bandwidth conditions to maximize vision model performance while m...

August 2023

Learning bird's-eye-view scene understanding from camera images

This paper presents a semi-supervised framework to improve bird's-eye-view semantic segmentation for self-driving vehicles by utilizing unlabeled camera images during training. The method uses consistency losses on model predictions for unlabeled data to improve generalization. It also introduces a novel 'conjoint rotation' data augmentation technique.

August 2023

Self-supervised object discovery in videos

This paper proposes a self-supervised framework to identify and segment multiple object instances in videos, without relying on human annotations. It jointly leverages high-level semantics and low-level temporal correspondence cues to effectively decompose foreground objects. The model first extracts per-frame visual features and calculates dense feature correlation bet...

August 2023

Learning to segment objects without class labels

This paper introduces a new approach for segmenting objects in images without relying on class labels during training. The model is trained on a limited set of base classes, then evaluated on its ability to segment novel, unseen classes not used in training. The key ideas are using an IoU prediction head instead of classification, and contrastive learning to distinguish...
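An IoU prediction head needs a regression target: the intersection-over-union between a predicted mask and its ground-truth mask. The sketch below only shows how that quantity is computed for binary masks; the function name and nested-list mask format are assumptions, and it is not the paper's training code.

```python
def mask_iou(pred_mask, true_mask):
    """Intersection-over-union of two same-sized binary masks (nested lists of 0/1)."""
    intersection = 0
    union = 0
    for pred_row, true_row in zip(pred_mask, true_mask):
        for p, t in zip(pred_row, true_row):
            if p and t:
                intersection += 1
            if p or t:
                union += 1
    # Convention chosen for this sketch: two empty masks give IoU 0.0.
    return intersection / union if union else 0.0
```

Because the target depends only on mask overlap and not on a category, it can be computed for objects of any class.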

August 2023

Few-shot semantic segmentation for self-driving cars

This paper proposes a method to incrementally train a semantic segmentation model for self-driving cars, using only a few labeled examples of new classes. It uses a scene embedding to find unlabeled images similar to the few labeled examples, and applies pseudo-labeling on them. Knowledge distillation retains prior knowledge. Experiments show improvements over baseline ...

July 2023

Nighttime semantic segmentation with images and events

This paper proposes a new method called Cross-Modality Domain Adaptation (CMDA) to perform nighttime semantic segmentation using both images and events from event cameras. Event cameras have high dynamic range and can capture details in low light better than conventional cameras. The key ideas are using an Image Motion-Extractor to simulate motion from images, an Image ...

July 2023

Point cloud segmentation using convolution and attention

This paper proposes a new neural network architecture called pCTFusion that combines convolution and self-attention mechanisms to improve segmentation of outdoor LiDAR point clouds. The approach fuses multi-scale convolution features and employs local and global self-attention blocks based on encoder position to capture local and global context. A novel loss function as...

July 2023

Incremental learning of semantic segmentation

This paper proposes a method for semantic segmentation models to incrementally learn to segment new object classes over time, without forgetting prior knowledge or requiring access to old training data. The key ideas are balancing the pace of forgetting across classes, maintaining semantic relationships, and generating high-quality pseudo-labels.

July 2023

Deforestation detection using satellite imagery

This paper presents a method to detect deforestation in the Amazon rainforest using satellite imagery from Landsat-8 and Sentinel-1. The method uses an attention-guided UNet architecture to perform semantic segmentation on the images and identify deforested areas. Models were trained individually on the optical and radar data. The model achieved good accuracy on a test ...

July 2023

Spherical CNNs with Feature Pyramids

This paper introduces a new architecture for semantic segmentation of spherical images. It builds spherical feature pyramid networks (S2FPNs) to leverage multi-scale features, inspired by successful planar image segmentation models. The authors use graph-based spherical CNNs, representing the image on an icosahedral mesh rather than a planar grid. This avoids distortion...

July 2023

3D Occupancy Prediction from Camera Images

This paper proposes FB-OCC, a method for predicting the occupancy and semantic class of voxels in a 3D space using only camera images as input. It builds on FB-BEV, which uses forward and backward view transformations to generate bird's eye view features. FB-OCC adds optimizations like joint depth-semantic pretraining, scaling up model size, and postprocessing to achiev...

July 2022

Deep Learning for Automated Strawberry Farming

This paper proposes using deep learning for automated strawberry farming, specifically to detect and classify strawberry trusses and runners in images. Trusses bear the flowers and fruit, while runners are horizontal stems that can root to produce new plants. Detecting trusses and runners is an important step towards automating pruning and harvesting. The authors create...

April 2022

Transforming Research into Engaging Science: A Computer Vision Paper Made Accessible

This paper proposes a new approach to lifting perspective representations from frontal view images to bird's eye view for autonomous driving. The method combines geometric and global spatial transformations to accurately map road scenes. The approach achieves state-of-the-art performance on public benchmarks.

May 2018

Demystifying Deep Learning: An Accessible Exploration of Neural Networks

This paper provides a high-level yet comprehensive overview of deep learning and neural networks. It introduces key concepts like artificial neurons, activation functions, and network architectures in clear, everyday language. Through intuitive explanations and engaging examples, the paper makes these advanced techniques understandable and accessible to a broad audience.

The history of semantic segmentation